NSF PAR Search | NSF Public Access Repository

On the use of real-world datasets for reaction yield prediction

https://doi.org/10.1039/d2sc06041h

Saebi, Mandana; Nan, Bozhao; Herr, John E.; Wahlers, Jessica; Guo, Zhichun; Zurański, Andrzej M.; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G.; Chawla, Nitesh V.; et al (March 2023, Chemical Science)

The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki–Miyaura and Buchwald–Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions.
Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection

Saebi, Mandana; et al (October 2021, EPJ Data Science)

Complex systems, represented as dynamic networks, comprise components that influence each other via direct and/or indirect interactions. Recent research has shown the importance of using Higher-Order Networks (HONs) for modeling and analyzing such complex systems, as the typical Markovian assumption in developing the First Order Network (FON) can be limiting. This higher-order network representation not only creates a more accurate representation of the underlying complex system, but also leads to more accurate network analysis. In this paper, we first present a scalable and accurate model, BuildHON+, for higher-order network representation of data derived from a complex system with various orders of dependencies. Then, we show that this higher-order network representation modeled by BuildHON+ is significantly more accurate in identifying anomalies than FON, demonstrating a need for the higher-order network representation
Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection

https://doi.org/10.1140/epjds/s13688-020-00233-y

Saebi, Mandana; Xu, Jian; Kaplan, Lance M.; Ribeiro, Bruno; Chawla, Nitesh V. (December 2020, EPJ Data Science)
Environment and shipping drive environmental DNA beta‐diversity among commercial ports

https://doi.org/10.1111/mec.16888

Andrés, Jose; Czechowski, Paul; Grey, Erin; Saebi, Mandana; Andres, Kara; Brown, Christopher; Chawla, Nitesh; Corbett, James J.; Brys, Rein; Cassey, Phillip; et al (March 2023, Molecular Ecology)

The spread of nonindigenous species by shipping is a large and growing global problem that harms coastal ecosystems and economies and may blur coastal biogeographical patterns. This study coupled eukaryotic environmental DNA (eDNA) metabarcoding with dissimilarity regression to test the hypothesis that ship‐borne species spread homogenizes port communities. We first collected and metabarcoded water samples from ports in Europe, Asia, Australia and the Americas. We then calculated community dissimilarities between port pairs and tested for effects of environmental dissimilarity, biogeographical region and four alternative measures of ship‐borne species transport risk. We predicted that higher shipping between ports would decrease community dissimilarity, that the effect of shipping would be small compared to that of environment dissimilarity and shared biogeography, and that more complex shipping risk metrics (which account for ballast water and stepping‐stone spread) would perform better. Consistent with our hypotheses, community dissimilarities increased significantly with environmental dissimilarity and, to a lesser extent, decreased with ship‐borne species transport risks, particularly if the ports had similar environments and stepping‐stone risks were considered. Unexpectedly, we found no clear effect of shared biogeography, and that risk metrics incorporating estimates of ballast discharge did not offer more explanatory power than simpler traffic‐based risks. Overall, we found that shipping homogenizes eukaryotic communities between ports in predictable ways, which could inform improvements in invasive species policy and management. We demonstrated the usefulness of eDNA metabarcoding and dissimilarity regression for disentangling the drivers of large‐scale biodiversity patterns. We conclude by outlining logistical considerations and recommendations for future studies using this approach.
